Norbu project report

Project introduction

This report presents an analysis of usage data for the Norbu app. The data were downloaded from Google Cloud and consist of user event records from 2021-04-04 to 2021-05-31.

Norbu's key goal is to reach a 28-day user retention rate of 20%. The aim of this analysis is to examine the company's current performance against a range of parameters and to identify areas for improvement that would help reach that goal.

To this end, we examined the app performance in the following aspects:

In what follows, we skip the data import and preprocessing steps and go straight to the analysis results. All datasets are in the GitHub repository if you want to download them and follow along with the analysis (just change the file path).

You can also skip the analysis and just read the Conclusion.

1. User retention

The company's main goal is to improve the 28-day retention rate. In this section, we will take a look at the overall retention rate, the retention rate by location, and the retention rate by platform.

1.1 Overall retention

There are a total of 78108 unique users in our data, disregarding the possibility that some users might have uninstalled and then reinstalled the app. We will use 'first_touch_datetime' to group users into cohorts.

It's a little strange to see some first_touch_datetime values dating all the way back to the 1970s and others as late as 2021-06-08, which is later than the event datetimes themselves. It would be useful to check why this is the case.

For now, given that the most frequent first_touch_datetime is '2020-11-17', let's keep only those users who started on or after that date. Before we slice the data, we need to see how many users we will lose.

Okay. It looks like about 2% of the data will be excluded. That's not bad, so we will go ahead and continue.
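
The check described above could look roughly like the sketch below. The toy values are made up, and treating 2020-11-17 itself as kept is an assumption; only the column names 'user_pseudo_id' and 'first_touch_datetime' come from the data.

```python
import pandas as pd

# Toy stand-in for the user table (values are invented for illustration).
users = pd.DataFrame({
    "user_pseudo_id": ["a", "b", "c", "d", "e"],
    "first_touch_datetime": pd.to_datetime(
        ["1970-01-01", "2020-11-17", "2021-04-06", "2021-05-02", "2021-06-08"]
    ),
})

# Assumption: users who started on the cutoff date itself are kept.
cutoff = pd.Timestamp("2020-11-17")

# Flag users whose first touch falls before the cutoff.
too_early = users["first_touch_datetime"] < cutoff
share_excluded = too_early.mean()  # fraction of users that would be dropped

kept = users.loc[~too_early].copy()
print(f"excluded {share_excluded:.1%} of users, kept {len(kept)}")
```

Computing the share before slicing makes it easy to decide whether the loss is acceptable.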

Given that the data span slightly under 2 months and the company's goal is to increase the 28-day retention rate, let's divide the users into weekly cohorts and track each cohort's retention rate by week. To this end, we will need a first_event_week and an event_week.
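
A minimal sketch of how the two week columns might be derived is shown below. It assumes weeks start on Monday (2021-04-05 is a Monday) and that the event time is stored in a column called 'event_datetime'; that column name, the helper week_start, and the toy values are illustrative assumptions.

```python
import pandas as pd

# Toy events; 'event_datetime' is an assumed column name.
events = pd.DataFrame({
    "user_pseudo_id": ["a", "a", "b", "b", "c"],
    "first_touch_datetime": pd.to_datetime(
        ["2021-04-05", "2021-04-05", "2021-04-07", "2021-04-07", "2021-04-14"]
    ),
    "event_datetime": pd.to_datetime(
        ["2021-04-05", "2021-04-13", "2021-04-07", "2021-05-04", "2021-04-14"]
    ),
})

def week_start(ts: pd.Series) -> pd.Series:
    """Return the Monday (at midnight) of the week each timestamp falls in."""
    days = ts.dt.normalize()
    return days - pd.to_timedelta(days.dt.weekday, unit="D")

events["first_event_week"] = week_start(events["first_touch_datetime"])
events["event_week"] = week_start(events["event_datetime"])

# Cohort lifetime: whole weeks between the cohort week and the event week.
events["cohort_lifetime"] = (
    events["event_week"] - events["first_event_week"]
).dt.days // 7

print(events[["user_pseudo_id", "first_event_week", "event_week", "cohort_lifetime"]])
```

With this layout, lifetime 4 corresponds to the 4th week after the cohort week, which is the point the 28-day goal refers to.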

Given that the first event_date in the log is 2021-04-04, we can't determine the original size of the earlier cohorts. What we can do instead is use the number of unique users in the earliest cohort_lifetime on record as their cohort_users, keeping in mind that the number of active users later in the cohort lifetime can then exceed 'cohort_users'.

Because not all cohorts in the data have a lifetime of 0, we take the user count at each cohort's earliest recorded lifetime as its initial size. For cohorts that have a lifetime of 0, this is the size at lifetime 0; for the others, it is the size at their earliest recorded lifetime.

For the last cohort, 2021-05-31, it is possible that the data were incomplete at the time of collection. Next, let's merge this with the cohorts dataset.
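
The cohort size rule and the merge can be sketched as follows. The column names 'active_users' and 'retention' and all the numbers are illustrative assumptions; 'first_event_week', 'cohort_lifetime' and 'cohort_users' follow the text above.

```python
import pandas as pd

# One row per (cohort, lifetime) with the number of active users (toy values).
# The 2021-03-29 cohort has no lifetime 0 on record because the log starts later.
cohorts = pd.DataFrame({
    "first_event_week": ["2021-03-29", "2021-03-29",
                         "2021-04-05", "2021-04-05", "2021-04-05"],
    "cohort_lifetime": [1, 2, 0, 1, 2],
    "active_users": [40, 35, 100, 30, 20],
})

# For each cohort, pick the row with the smallest lifetime on record.
earliest_rows = cohorts.groupby("first_event_week")["cohort_lifetime"].idxmin()
initial = (
    cohorts.loc[earliest_rows, ["first_event_week", "active_users"]]
    .rename(columns={"active_users": "cohort_users"})
)

# Attach the cohort size to every row and compute the retention rate.
# For cohorts without lifetime 0 this rate can exceed 1 in later weeks.
cohorts = cohorts.merge(initial, on="first_event_week")
cohorts["retention"] = cohorts["active_users"] / cohorts["cohort_users"]
print(cohorts)
```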

Given the span of the cohorts, for the purpose of this report, we will also present the cohorts from 2021-04-05 onwards.

Disregarding the last week in May, we can see an overall decline in cohort size. The cohorts of April 12th and 19th 2021 seem to be the largest.

Although the initial cohort sizes differ slightly, by the 4th week of cohort lifetime retention has dropped to about 8% on average, far from the 20% the company is hoping to achieve.

Next, let's take a look at retention rate by country and by platform. We will only be looking at users whose first event week is 2021-04-05 or later.

1.2 Retention by region

First, let's find out which countries have the most Norbu app users.

Okay. Iran really stands out as having the most app users.

Iran alone accounts for almost 50% of the users and Germany comes next. In what follows, we can divide the users into three groups, those from Iran, from Germany, and from other countries to calculate the retention rates.

Let's define a function that takes a dataframe as input and returns the retention pivot table for the period from 2021-04-05 onwards.
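
One way such a function might look is sketched below. The function name retention_pivot and its start parameter are hypothetical, and the sketch assumes the events already carry 'first_event_week' (as timestamps) and 'cohort_lifetime' columns.

```python
import pandas as pd

def retention_pivot(events: pd.DataFrame, start: str = "2021-04-05") -> pd.DataFrame:
    """Rows: weekly cohorts from `start` onwards; columns: cohort lifetime in
    weeks; values: share of the cohort's week-0 users active in that week."""
    df = events[events["first_event_week"] >= pd.Timestamp(start)]
    active = (
        df.groupby(["first_event_week", "cohort_lifetime"])["user_pseudo_id"]
        .nunique()
        .unstack("cohort_lifetime")
    )
    # Cohorts that start inside the log window all have a lifetime-0 column.
    return active.div(active[0], axis=0)

# Toy events: four users in the 2021-04-05 cohort, one user in an older cohort.
events = pd.DataFrame({
    "user_pseudo_id": ["a", "b", "c", "d", "a", "b", "a", "e"],
    "first_event_week": pd.to_datetime(["2021-04-05"] * 7 + ["2021-03-29"]),
    "cohort_lifetime": [0, 0, 0, 0, 1, 1, 4, 1],
})

pivot = retention_pivot(events)
print(pivot)
```

The same function can then be applied to any subset of the events, for example the rows for a single country or platform, to get that group's retention table.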

In the 4th week, the average retention rate among Iranian users is about 9%.

In the 4th week, the average retention rate among German users is about 8.5%.

Essentially, the Iranian and German users show retention rates similar to the overall retention rate.

1.3 Retention rate by platform

On average, by the 4th week, Android users' retention rate is about 8.8%, whereas iOS users' retention rate is about 4.9%.

There is a big difference between the two platforms' users. Is it possible that the app is more Android-friendly?

1.4 Retention rate by traffic medium

Let's divide the users into two groups: those who come from organic sources and those who do not.

By the 4th lifetime week, retention is slightly higher among users from organic sources (8.7%) than among users from non-organic sources (7.7%).

2. User LTV

Let's first address the task of calculating user LTV. The dataset made it easy for us because there is already an LTV column. We will use it to perform the following analysis.

The ltv in the dataset is cumulative for each user, so the most recent (last) ltv value is that user's ltv.

Okay. From the above we can see that only 441 unique users have values in the ltv column, half of whom have an LTV under 5 US dollars. As of 2021-05-31, the total lifetime value across users is about 3910 US dollars.
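
Taking the last cumulative value per user can be sketched as below. The column name 'event_datetime' and the toy values are assumptions made for illustration; 'ltv' and 'user_pseudo_id' follow the text.

```python
import pandas as pd

# Toy events with a cumulative ltv column; most rows have no ltv at all.
events = pd.DataFrame({
    "user_pseudo_id": ["a", "a", "a", "b", "b", "c"],
    "event_datetime": pd.to_datetime(
        ["2021-04-10", "2021-04-05", "2021-04-20",
         "2021-04-06", "2021-04-09", "2021-04-07"]
    ),
    "ltv": [2.0, None, 5.0, 1.0, 1.0, None],
})

# Keep rows that carry an ltv, order them in time, take the last one per user.
user_ltv = (
    events.dropna(subset=["ltv"])
    .sort_values("event_datetime")
    .groupby("user_pseudo_id")["ltv"]
    .last()
)

print("users with ltv:", user_ltv.size)
print("median ltv:", user_ltv.median())
print("total ltv:", user_ltv.sum())
```

Because the column is cumulative, summing every row would count the same revenue several times; summing only the last value per user avoids that.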

(Note: this result is based on the events remaining after filtering out 'first_touch_time' values before 2020-11-17. Results may change slightly if the filtering is not done.)

Although Iran seems to have the most users, Russia tops the list in terms of its total user ltv. Germany comes second. It's interesting to notice that Iran is not among the top 10 countries as far as user ltv is concerned.

Most of the users are Android users, so it makes sense that they generate most of the ltv as well. However, it's interesting to see that iOS users have a higher average user ltv.

Organic users contribute the majority of the user ltv. Users from unknown sources seem to have the highest average user ltv. It would be interesting to find out why these users' sources are unknown.

Overall, in this section we took a look at user ltv. One of the company's questions is what affects user ltv. At this stage, it is not certain that this can be answered from the data.

3. Top user events

Let's take a look at the most popular user events.

From the above, we can see that app home screen, as we would expect, is the most visited session. The training details screen, breathe homescreen, ball game homescreen and stress management home screen also seem to be popular among users. In what follows, we will be looking at these one by one.

Before we move on to analyze individual events, we will need to join the 2 datasets so we have both event parameters and user information in the same data.

4. A user journey function

Let's define a function to plot the user journey. Its parameters are: the events data, the session to treat as the first session, the number of steps the journey is tracked for, and the number of parallel paths to keep (the rest are grouped as 'other').
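
The counting part of such a function might be sketched as follows; the plotting itself is left out. The function name journey_steps, the column names 'event_name' and 'event_timestamp', and the toy event names are all assumptions for illustration, not the implementation used for this report.

```python
from collections import Counter

import pandas as pd

def journey_steps(events, start_event, n_steps=3, top_n=2):
    """For every occurrence of `start_event`, follow the same user's next
    `n_steps` events and count what was seen at each step. Only the `top_n`
    most common events per step keep their name; the rest become 'other'."""
    events = events.sort_values(["user_pseudo_id", "event_timestamp"])
    counters = [Counter() for _ in range(n_steps)]
    for _, names in events.groupby("user_pseudo_id")["event_name"]:
        seq = names.tolist()
        for i, name in enumerate(seq):
            if name != start_event:
                continue
            for step, nxt in enumerate(seq[i + 1 : i + 1 + n_steps]):
                counters[step][nxt] += 1
    result = []
    for counter in counters:
        top = dict(counter.most_common(top_n))
        other = sum(counter.values()) - sum(top.values())
        if other:
            top["other"] = other
        result.append(top)
    return result

# Toy event log for three users.
events = pd.DataFrame({
    "user_pseudo_id": ["a", "a", "a", "b", "b", "b", "b", "c", "c"],
    "event_timestamp": [1, 2, 3, 1, 2, 3, 4, 1, 2],
    "event_name": ["home", "breathe", "result",
                   "home", "game", "home", "breathe",
                   "home", "survey"],
})

steps = journey_steps(events, "home", n_steps=2, top_n=1)
print(steps[0])
```

Each element of the returned list holds the counts for one step, which is the shape a Sankey or funnel plot would need as input.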

5. User stress assessment

From above, we can see that users arrived at the screen where they are asked to rate their stress 169,754 times.

Among those, a total of 22913 result sessions were activated, each with a before and after stress level self-evaluation. This represents approximately 13% of the total sessions at the stress level home screen.

Next, let's extract the results from the event parameters.

Next, let's take a look at the users' before and after stress level self-assessments.

As shown in this section, from scr_stress_level to result_session, approximately 13% of the time (22913/169754) users actually start and complete a training session and trigger a before and after stress evaluation result.

For completed user sessions, we can see that most of them, approximately 78.8%, result in a reduced stress level.

A quick look at the median before-session stress level by country, for the top 10 countries with the most users, shows values ranging from 4 to 6. Russia has the lowest self-assessed before-session stress level.
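
The two figures in this section can be reproduced along these lines. The counts 22913 and 169754 come from the text; the column names 'stress_before' and 'stress_after' and the five toy sessions are assumptions for illustration.

```python
import pandas as pd

# Share of stress-level screen visits that end in a result session.
conversion = 22913 / 169754
print(f"completed sessions: {conversion:.1%}")

# Toy result sessions with self-assessed stress before and after the training.
results = pd.DataFrame({
    "stress_before": [7, 5, 6, 4, 8],
    "stress_after": [4, 5, 3, 5, 2],
})

# A negative change means the user reported less stress after the session.
results["change"] = results["stress_after"] - results["stress_before"]
share_reduced = (results["change"] < 0).mean()
print(f"sessions with reduced stress: {share_reduced:.1%}")
```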

6. Survey analysis

Next, let's plot a funnel from the survey home screen to the end of the survey.

From the above, we can see that there are 68529 sessions on the survey screen. Of those, 62941 progress to start the survey, and 47851 complete the survey and generate a score against the 6 domains.
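
The step-by-step and overall conversion rates implied by these counts can be worked out with a few lines of plain Python. The step labels are descriptive names chosen here, not event names from the data.

```python
# Funnel counts taken from the text above.
funnel = [
    ("survey screen", 68529),
    ("survey started", 62941),
    ("survey completed", 47851),
]

top = funnel[0][1]
rows = []
prev = top
for name, count in funnel:
    # step rate: share of the previous step; overall rate: share of the first step
    rows.append((name, count, count / prev, count / top))
    prev = count

for name, count, step_rate, overall_rate in rows:
    print(f"{name:<18}{count:>7}  step {step_rate:6.1%}  overall {overall_rate:6.1%}")
```

About 92% of the sessions on the survey screen start the survey, about 76% of those finish it, and about 70% of all sessions on the survey screen end with a completed survey.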

Let's take a closer look at the survey results.

In the event_param, the dictionaries with domain1 to domain6 as keys are where the survey results are stored.

Most users completed the survey only once during the nearly 2-month data period.

Let's find out how users' sleep norm scores changed across their surveys.
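
A sketch of the extraction is shown below. It assumes that each event_param value is a list of dictionaries with 'key' and 'value' fields; the real structure may differ, and the helper names and toy scores are made up for illustration.

```python
import pandas as pd

DOMAINS = [f"domain{i}" for i in range(1, 7)]

def make_params(scores):
    """Build a toy event_param value: a list of key/value dicts (assumed shape)."""
    params = [{"key": d, "value": s} for d, s in zip(DOMAINS, scores)]
    params.append({"key": "screen", "value": "survey_end"})  # unrelated parameter
    return params

surveys = pd.DataFrame({
    "user_pseudo_id": ["a", "b"],
    "event_param": [
        make_params([6, 5, 7, 4, 8, 6]),
        make_params([3, 6, 6, 5, 7, 9]),
    ],
})

def domain_scores(params):
    """Keep only the domain1..domain6 entries, as a flat dict."""
    found = {p["key"]: p["value"] for p in params if p["key"] in DOMAINS}
    return {d: found.get(d) for d in DOMAINS}

# One column per domain, aligned with the original survey rows.
scores = pd.DataFrame(
    surveys["event_param"].map(domain_scores).tolist(), index=surveys.index
)
survey_scores = surveys[["user_pseudo_id"]].join(scores)
print(survey_scores)
```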

The above shows the countries with the lowest median sleep norm score in the survey. However, these seem to be the countries that also don't have many app users.

The above shows the score distribution for sleep norm.

The above shows the top 10 countries in terms of the number of unique users who participated in the survey. Interestingly, the median sleep norm score is 6 in all of them.

7. App removal

Let's take a look at when users remove the app.

From the above we can see that 75% of these users remove the app by their 4th session. Barely any of them exceed a total of 20 sessions.

All of those users are using an ANDROID device.

The top countries these users come from resemble the overall ranking of countries by user count.

There are 37545 unique users who have removed the app, 140 fewer than the total number of removals. This means some users removed the app more than once while keeping the same user_pseudo_id. Next, let's take a look at how long users stayed with the app before removing it.

More than half of the users removed the app within 10 days, and 75% removed it within 50 days.

Next, let's get the user_pseudo_id values of the users who removed the app and have a look at their usage journey.
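
The repeat-removal count and the time-to-removal quartiles could be computed along these lines. The 'event_datetime' and 'days_to_removal' column names and the toy rows are assumptions, and measuring the stay from first_touch_datetime is one possible definition, not necessarily the one used for the figures above.

```python
import pandas as pd

# Toy app-removal events; user "a" removed the app twice.
removals = pd.DataFrame({
    "user_pseudo_id": ["a", "a", "b", "c"],
    "first_touch_datetime": pd.to_datetime(
        ["2021-04-05", "2021-04-05", "2021-04-10", "2021-03-01"]
    ),
    "event_datetime": pd.to_datetime(
        ["2021-04-08", "2021-04-20", "2021-04-11", "2021-05-01"]
    ),
})

n_removals = len(removals)
n_users = removals["user_pseudo_id"].nunique()
print("repeat removals:", n_removals - n_users)

# Whole days between the first touch and the removal event.
removals["days_to_removal"] = (
    removals["event_datetime"] - removals["first_touch_datetime"]
).dt.days

quartiles = removals["days_to_removal"].quantile([0.5, 0.75])
print(quartiles)
```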

Almost half of the users in our data removed the app.

8. Training unlock analysis

There are a total of 6 trainings that users can unlock for five days.

The qh0 training has the most session starts, at 14957. The qh6 training has the highest finish rate, at 2.99%.

We can see overall, there is a higher purchase rate among those who finished the training than those who haven't.

9. In app purchase

There are two in_app_purchase events in the data. We will use the norbu_in_app_purchase event as this one seems to be more complete.

Among those sessions that arrive at the home screen for premium purchase, 2487 actually resulted in a purchase.

It seems that 75% of the purchases occur in the 5th session or before.

The most purchased product is the premium package, followed by norbu_mounth (the product name is possibly a typo). qh3 seems to be the most purchased training.

10. Meditation

Approximately 56% of the users who were at the meditation homescreen completed the meditation training. Let's take a look at where others have gone next.

11. Game

45% of the sessions at the ball game home screen went on to complete the game. Let's take a look at where the other sessions divert.

12. User sessions

In this section, let's have a quick look at how many sessions users usually had during the period of the data collection.

Half of the users have at most 3 sessions and 75% of the users have at most 5 sessions, with some extreme outliers going beyond 100.
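
A summary like this can be obtained by counting distinct sessions per user and then taking quantiles. The 'session_id' column name and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy events: one row per event, several events can share a session.
events = pd.DataFrame({
    "user_pseudo_id": list("aaaa") + ["b"] + list("cc") + list("ddddd"),
    "session_id": [1, 1, 2, 3, 1, 1, 2, 1, 2, 3, 4, 5],
})

# Number of distinct sessions each user had in the data period.
sessions_per_user = events.groupby("user_pseudo_id")["session_id"].nunique()

summary = sessions_per_user.quantile([0.5, 0.75])
print(summary)
print("max sessions:", sessions_per_user.max())
```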

13. Conclusion

Overall, the following issues have been noticed from the analysis:

Suggestion

In brief, users need to be told what to do. People nowadays don't think much when using apps and quite often just mindlessly scroll up and down and click on the shiniest button that catches their attention. In order to keep users, the app needs to give them exactly what they need and make them feel the program is personalized for them, whether this is true or not. Imagine users open the screen and see this:

                                  Hello Xia.
                                  How are you feeling at the moment? (options)
                                  Here is the best exercise for you today. It only takes 8 minutes. 
                                        ....
                                  Well done for taking the time to look after yourself. 
                                  If you would like to do another exercise, click continue.
                                  If you are ready to go on with your day, click close. 
                                  Thank you and see you next time